SAWdoubler: a program for counting self-avoiding walks
This article presents SAWdoubler, a package for counting the total number
Z(N) of self-avoiding walks (SAWs) on a regular lattice by the length-doubling
method, of which the basic concept has been published previously by us. We
discuss an algorithm for the creation of all SAWs of length N, efficient
storage of these SAWs in a tree data structure, and an algorithm for the
computation of correction terms to the count Z(2N) for SAWs of double length,
removing all combinations of two intersecting single-length SAWs.
We present an efficient numbering of the lattice sites that enables
exploitation of symmetry and leads to a smaller tree data structure; this
numbering is by increasing Euclidean distance from the origin of the lattice.
Furthermore, we show how the computation can be parallelised by distributing
the iterations of the main loop of the algorithm over the cores of a multicore
architecture. Experimental results on the 3D cubic lattice demonstrate that
Z(28) can be computed on a dual-core PC in only 1 hour and 40 minutes, with a
speedup of 1.56 compared to the single-core computation and with a gain by
using symmetry of a factor of 26. We present results for memory use and show
how the computation is made to fit in 4 Gbyte RAM. It is easy to extend the
SAWdoubler software to other lattices; it is publicly available under the GNU
LGPL license. Comment: 29 pages, 3 figures
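The length-doubling idea in this abstract can be illustrated with a small sketch. It is not the SAWdoubler implementation: it uses no tree storage, symmetry, or site numbering, and it checks pairs of walks naively. The function names are hypothetical. It relies on the observation that a SAW of length 2N splits at its midpoint into two SAWs of length N that share only that midpoint, so Z(2N) equals Z(N)^2 minus the number of ordered pairs that intersect.

```python
# Unit steps on the 3D cubic lattice.
STEPS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def saws(n):
    """Yield each n-step self-avoiding walk from the origin as a tuple of
    the sites it visits, with the origin itself left out."""
    def extend(path, visited):
        if len(path) == n:
            yield tuple(path)
            return
        x, y, z = path[-1] if path else (0, 0, 0)
        for dx, dy, dz in STEPS:
            site = (x + dx, y + dy, z + dz)
            if site not in visited:
                visited.add(site)
                path.append(site)
                yield from extend(path, visited)
                path.pop()
                visited.remove(site)

    yield from extend([], {(0, 0, 0)})


def count_direct(n):
    """Z(n) by plain enumeration."""
    return sum(1 for _ in saws(n))


def count_by_doubling(n):
    """Z(2n) from the walks of length n: all ordered pairs, minus the
    pairs that share a site other than the common origin.
    Naive pairwise check; an assumption made for brevity."""
    walks = [frozenset(w) for w in saws(n)]
    intersecting = sum(1 for a in walks for b in walks if a & b)
    return len(walks) ** 2 - intersecting
```

For example, `count_by_doubling(2)` and `count_direct(4)` both give 726, the number of 4-step SAWs on the cubic lattice.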
Exact k-way sparse matrix partitioning
To minimize the communication in parallel sparse matrix-vector multiplication while maintaining load balance, we need to partition the sparse matrix optimally into k disjoint parts, which is an NP-complete problem. We present an exact algorithm based on the branch and bound (BB) method which partitions a matrix for any k, and we explore exact sparse matrix partitioning beyond bipartitioning. The algorithm has been implemented in the software package General Matrix Partitioner (GMP). We also present an integer linear programming (ILP) model for the same problem, based on a hypergraph formulation. We used both methods to determine optimal 2-, 3-, and 4-way partitionings for a subset of small matrices from the SuiteSparse Matrix Collection. For k=2, BB outperforms ILP, whereas for larger k, ILP is superior. We used the results found by these exact methods for k=4 to analyse the performance of recursive bipartitioning (RB) with exact bipartitioning. For 46 of the 89 matrices in our test set of matrices with fewer than 250 nonzeros, the communication volume determined by RB was optimal. For the other matrices, RB is able to find 4-way partitionings with communication volume close to the optimal volume
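To make the objective concrete, the following sketch computes the communication volume of a given k-way partitioning of the nonzeros and finds an optimal partitioning by exhaustive search. It is not the branch-and-bound algorithm of GMP nor the ILP model: plain enumeration is swapped in for brevity and is feasible only for a handful of nonzeros. The volume uses the common (lambda - 1) metric, and the function names and the exact form of the load-balance bound with parameter `eps` are assumptions.

```python
import math
from itertools import product


def communication_volume(nonzeros, parts):
    """Sum over all rows and columns of (number of parts present - 1).

    nonzeros: list of (row, col) positions; parts: part index per nonzero.
    """
    rows, cols = {}, {}
    for (i, j), p in zip(nonzeros, parts):
        rows.setdefault(i, set()).add(p)
        cols.setdefault(j, set()).add(p)
    return (sum(len(s) - 1 for s in rows.values())
            + sum(len(s) - 1 for s in cols.values()))


def exact_partition(nonzeros, k, eps=0.03):
    """Return (volume, parts) of minimum volume by trying all k**nnz
    assignments (an illustrative stand-in for branch and bound).

    Assumed balance constraint: each part holds at most
    (1 + eps) * ceil(nnz / k) nonzeros.
    """
    max_size = int((1 + eps) * math.ceil(len(nonzeros) / k))
    best = None
    for parts in product(range(k), repeat=len(nonzeros)):
        if max(parts.count(p) for p in range(k)) > max_size:
            continue  # violates load balance
        volume = communication_volume(nonzeros, parts)
        if best is None or volume < best[0]:
            best = (volume, parts)
    return best
```

On a dense 2x2 matrix with k=2 the optimal volume is 2 (split by rows or by columns), whereas a matrix made of two disjoint dense 2x2 diagonal blocks can be bipartitioned with volume 0.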
Partitioning a call graph
Splitting a large software system into smaller and more manageable units has become an important problem for many organizations. The basic structure of a software system is given by a directed graph with vertices representing the programs of the system and arcs representing calls from one program to another. Generating a good partitioning into smaller modules becomes a minimization problem for the number of programs being called by external programs. First, we formulate an equivalent integer linear programming problem with 0–1 variables. Theoretically, with this approach the problem can be solved to optimality, but this becomes very costly with increasing size of the software system. Second, we formulate the problem as a hypergraph partitioning problem. This is a heuristic method using a multilevel strategy, but it turns out to be very fast and to deliver solutions that are close to optimal
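The quantity being minimized can be evaluated for any candidate partitioning with a few lines of code. The sketch below only scores a given assignment of programs to modules; it does not implement the ILP or the hypergraph partitioner, and the function name and data layout are hypothetical.

```python
def externally_called(calls, module_of):
    """Return the set of programs that are called from another module.

    calls: iterable of (caller, callee) arcs of the call graph.
    module_of: dict mapping each program to its module.
    The size of the returned set is the cost to be minimized.
    """
    return {callee for caller, callee in calls
            if module_of[caller] != module_of[callee]}


calls = [("a", "b"), ("a", "c"), ("b", "c"), ("d", "c")]
good = {"a": 0, "b": 0, "c": 1, "d": 1}  # only c is called externally
poor = {"a": 0, "b": 1, "c": 1, "d": 0}  # both b and c are called externally
```

Here `len(externally_called(calls, good))` is 1 and `len(externally_called(calls, poor))` is 2, so the first assignment is the better partitioning under this objective.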
Parallel Sparse LU Decomposition on a Mesh Network of Transputers
A parallel algorithm is presented for the LU decomposition of a general sparse matrix on a distributed-memory MIMD multiprocessor with a square mesh communication network. In the algorithm, matrix elements are assigned to processors according to the grid distribution. Each processor represents the nonzero elements of its part of the matrix by a local, ordered, two-dimensional linked-list data structure. The complexity of important operations on this data structure and on several others is analysed. At each step of the algorithm, a parallel search for a set of m compatible pivot elements is performed. The Markowitz counts of the pivot elements are close to minimum, to preserve the sparsity of the matrix. The pivot elements also satisfy a threshold criterion, to ensure numerical stability. The compatibility of the m pivots enables the simultaneous elimination of m pivot rows and m pivot columns in a rank-m update of the reduced matrix. Experimental results on a network of 400 transputers are presented for a set of test matrices from the Harwell–Boeing sparse matrix collection
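The pivot selection described in this abstract can be sketched sequentially. The code below is an illustration under stated assumptions, not the parallel algorithm of the paper: the Markowitz count of a nonzero a_ij is taken as (r_i - 1)(c_j - 1), with r_i and c_j the nonzero counts of row i and column j; the threshold criterion is assumed to be |a_ij| >= u * max_l |a_lj| with a hypothetical parameter `u`; and two pivots (i, j) and (i', j') are treated as compatible when a_ij' and a_i'j are both zero. Candidates are taken greedily in order of increasing Markowitz count.

```python
def pivot_candidates(a, u=0.1):
    """Return (markowitz_count, i, j) for every nonzero passing the
    assumed threshold test, sorted by increasing Markowitz count.

    a: dict mapping (i, j) to a nonzero value (explicit zeros not stored).
    """
    row_count, col_count, col_max = {}, {}, {}
    for (i, j), value in a.items():
        row_count[i] = row_count.get(i, 0) + 1
        col_count[j] = col_count.get(j, 0) + 1
        col_max[j] = max(col_max.get(j, 0.0), abs(value))
    return sorted(
        ((row_count[i] - 1) * (col_count[j] - 1), i, j)
        for (i, j), value in a.items()
        if abs(value) >= u * col_max[j]
    )


def compatible_pivots(a, candidates, m):
    """Greedily pick up to m mutually compatible pivots."""
    chosen = []
    for _, i, j in candidates:
        if all((i, j2) not in a and (i2, j) not in a for i2, j2 in chosen):
            chosen.append((i, j))
            if len(chosen) == m:
                break
    return chosen


a = {(0, 0): 4.0, (0, 1): 1.0, (1, 1): 3.0, (2, 2): 5.0, (2, 0): 0.01}
pivots = compatible_pivots(a, pivot_candidates(a), m=3)
```

In this small example the entry at (2, 0) fails the threshold test, and the pivots (1, 1) and (2, 2) are selected; (0, 0) and (0, 1) are rejected because row 0 has a nonzero in column 1, which conflicts with the pivot (1, 1).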
Open Problems in (Hyper)Graph Decomposition
Large networks are useful in a wide range of applications. Sometimes problem
instances are composed of billions of entities. Decomposing and analyzing these
structures helps us gain new insights about our surroundings. Even if the final
application concerns a different problem (such as traversal, finding paths,
trees, and flows), decomposing large graphs is often an important subproblem
for complexity reduction or parallelization. This report is a summary of
discussions that happened at Dagstuhl seminar 23331 on "Recent Trends in Graph
Decomposition" and presents currently open problems and future directions in
the area of (hyper)graph decomposition
Sparse Matrix Computations on Bulk Synchronous Parallel Computers
The Bulk Synchronous Parallel (BSP) programming model is studied in the context of sparse matrix computations. As a case study, a BSP algorithm is developed for sparse Cholesky factorisation
A two-dimensional data distribution method for parallel sparse matrix-vector multiplication
A new method is presented for distributing data in sparse matrix-vector multiplication. The method is two-dimensional, tries to minimise the true communication volume, and also tries to spread the computation and communication work evenly over the processors. The method starts with a recursive bipartitioning of the sparse matrix, each time splitting a rectangular matrix into two parts with a nearly equal number of nonzeros. The communication volume caused by the split is minimised. After the matrix partitioning, the input and output vectors are partitioned with the objective of minimising the maximum communication volume per processor. Experimental results of our implementation, Mondriaan, for a set of sparse test matrices show a reduction in communication compared to one-dimensional methods, and in general a good balance in the communication work
A two-dimensional data distribution method for parallel sparse matrix-vector multiplication
A new method is presented for distributing data in sparse matrix-vector multiplication. The method is two-dimensional, tries to minimize the true communication volume, and also tries to spread the computation and communication work evenly over the processors. The method starts with a recursive bipartitioning of the sparse matrix, each time splitting a rectangular matrix into two parts with a nearly equal number of nonzeros. The communication volume caused by the split is minimized. After the matrix partitioning, the input and output vectors are partitioned with the objective of minimizing the maximum communication volume per processor. Experimental results of our implementation, Mondriaan, for a set of sparse test matrices show a reduction in communication volume compared to one-dimensional methods, and in general a good balance in the communication work. Experimental timings of an actual parallel sparse matrix-vector multiplication on an SGI Origin 3800 computer show that a sufficiently large reduction in communication volume leads to savings in execution time
An Improved Algorithm for Parallel Sparse LU Decomposition on a Distributed-Memory Multiprocessor
In this paper we present a new parallel algorithm for the LU decomposition of a general sparse matrix. Among its features are matrix redistribution at regular intervals and a dynamic pivot search strategy that adapts itself to the number of pivots produced. Experimental results obtained on a network of 400 transputers show that these features considerably improve the performance. 1 Introduction This paper presents an improved version of the parallel algorithm for the LU decomposition of a general sparse matrix developed by van der Stappen, Bisseling, and van de Vorst [9]. The LU decomposition of a matrix $A = (A_{ij},\ 0 \le i, j < n)$ produces a unit lower triangular matrix $L$, an upper triangular matrix $U$, a row permutation vector $\pi$ and a column permutation vector $\rho$, such that $A_{\pi_i, \rho_j} = (LU)_{ij}$ for $0 \le i, j < n$. (1) We assume that A is sparse and nonsingular and that it has an arbitrary pattern of nonzeros, with all elements having the same (small) probability of being nonzero. A re..